Creating Customer Segments

Data Exploration

Criteria	Meet Specification
Question 1 Selecting Samples	Three separate samples of the data are chosen, and the type of establishment each sample could represent is proposed based on the statistical description of the dataset.
Question 2 Feature Relevance	A prediction score for the removed feature is accurately reported. A justified argument is made for whether or not the removed feature is relevant.
Question 3 Feature Distributions	Student identifies features that are correlated and compares these features to the predicted feature. Student further discusses the data distribution for those features.
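A minimal sketch of the Question 2 check, assuming the data is a NumPy array with one column per spending category. One feature is removed, predicted from the remaining features, and scored with R^2 on a held-out split. Ordinary least squares is an assumed stand-in for whatever regressor is actually used. The synthetic data, the column indices, and the helper names (`feature_relevance_score`, `r2_score`) are all hypothetical. A high score suggests the removed feature is largely recoverable from the others, and so may be less relevant. The correlation matrix at the end supports the Question 3 discussion of correlated features.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def feature_relevance_score(data, target_col, test_frac=0.25, seed=0):
    """Remove one feature, predict it from the remaining ones, return held-out R^2.

    Ordinary least squares is used as a stand-in regressor (an assumption).
    """
    rng = np.random.default_rng(seed)
    y = data[:, target_col]
    X = np.delete(data, target_col, axis=1)
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return r2_score(y[test], X[test] @ coef)


# Hypothetical data: 6 spending features, where column 2 is built from columns 0 and 1.
rng = np.random.default_rng(42)
base = rng.lognormal(mean=8, sigma=1, size=(440, 6))
base[:, 2] = 0.8 * base[:, 0] + 0.5 * base[:, 1] + rng.normal(0, 100, 440)

score = feature_relevance_score(base, target_col=2)
print(f"R^2 predicting column 2 from the rest: {score:.3f}")

# Pairwise correlations show which features move together.
corr = np.corrcoef(base, rowvar=False)
print(np.round(corr, 2))
```

A negative or near-zero score would instead suggest that the removed feature carries information the other features cannot reproduce.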

Data Preprocessing

Criteria	Meet Specification
Feature Scaling	Feature scaling for both the data and the sample data has been properly implemented in code.
Question 4 Outlier Detection	Student identifies extreme outliers and discusses whether they should be removed. The removal of any data points is justified.
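One way to meet the scaling and outlier criteria, sketched under assumptions: the scaling is a natural-log transform applied identically to the data and the sample data, and outliers are flagged per feature with Tukey's fences at 1.5 times the interquartile range. Only points flagged in at least two features are treated as removal candidates. That threshold, the data, and the helper names are hypothetical, and a stricter multiplier such as 3 is sometimes used for "extreme" outliers.

```python
import numpy as np


def log_scale(x):
    """Natural-log transform; assumes strictly positive spending values."""
    return np.log(x)


def tukey_outliers(log_data, k=1.5):
    """Return, per feature index, the row indices outside Tukey's fences.

    A point is flagged when it lies more than k * IQR below Q1 or above Q3.
    """
    flagged = {}
    for j in range(log_data.shape[1]):
        q1, q3 = np.percentile(log_data[:, j], [25, 75])
        step = k * (q3 - q1)
        mask = (log_data[:, j] < q1 - step) | (log_data[:, j] > q3 + step)
        flagged[j] = np.flatnonzero(mask)
    return flagged


def multi_feature_outliers(flagged, min_features=2):
    """Rows flagged as outliers in at least `min_features` features."""
    counts = {}
    for rows in flagged.values():
        for r in rows:
            counts[int(r)] = counts.get(int(r), 0) + 1
    return sorted(r for r, c in counts.items() if c >= min_features)


rng = np.random.default_rng(0)
data = rng.lognormal(mean=8, sigma=1, size=(440, 6))  # hypothetical spending data
data[10] = 1.0  # hypothetical extreme customer: tiny spend in every category
samples = data[[0, 1, 2]]  # hypothetical sample rows

# The same transform is applied to the data and to the sample data.
log_data = log_scale(data)
log_samples = log_scale(samples)

flagged = tukey_outliers(log_data)
extreme = multi_feature_outliers(flagged)
print("Rows flagged in 2+ features:", extreme)

good_data = np.delete(log_data, extreme, axis=0)
```

Whether the flagged rows should actually be dropped is still a judgment call that needs the justification the criterion asks for.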

Feature Transformation

Criteria	Meet Specification
Question 5 Principal Component Analysis	The total variance explained by the first two and the first four principal components is accurately reported. The first four dimensions are interpreted, with justification, as representations of customer spending.
Dimensionality Reduction	PCA has been properly implemented in code and applied to both the scaled data and the scaled sample data for the two-dimensional case.
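The PCA criteria can be illustrated with a small SVD-based sketch. `SimplePCA` is a hypothetical stand-in for a library PCA class, and the synthetic `good_data` is assumed to be already log-scaled with outliers removed. The cumulative sum of the explained-variance ratios gives the totals for two and four dimensions. The two-component model is fitted once and then applied to both the data and the samples, so both are projected with the same mean and axes.

```python
import numpy as np


class SimplePCA:
    """Minimal PCA via SVD; a hypothetical stand-in for a library PCA class."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Rows of vt are the principal axes, ordered by decreasing singular value.
        _, s, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        variance = s ** 2 / (len(X) - 1)
        self.explained_variance_ratio_ = (variance / variance.sum())[: self.n_components]
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X):
        # Center with the mean learned in fit(), so samples share the same projection.
        return (X - self.mean_) @ self.components_.T


# Hypothetical log-scaled data (outliers removed): 6 features driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(440, 2))
mixing = rng.normal(size=(2, 6))
good_data = latent @ mixing + 0.1 * rng.normal(size=(440, 6))
log_samples = good_data[:3]  # hypothetical sample rows

# Fit all six components to report cumulative explained variance.
pca_full = SimplePCA(n_components=6).fit(good_data)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print(f"Two dimensions explain {cumulative[1]:.3f} of the variance")
print(f"Four dimensions explain {cumulative[3]:.3f} of the variance")

# Two-dimensional case: one fit, applied to both the data and the samples.
pca = SimplePCA(n_components=2).fit(good_data)
reduced_data = pca.transform(good_data)
pca_samples = pca.transform(log_samples)
print(reduced_data.shape, pca_samples.shape)
```

Interpreting each dimension then comes down to reading the signs and magnitudes of the feature weights in the corresponding row of `components_`.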

Clustering

Criteria	Meet Specification
Question 6 Clustering Algorithm	The Gaussian Mixture Model and K-Means algorithms have been compared in detail. Student’s choice of algorithm is justified based on the characteristics of the algorithm and data.
Question 7 Creating Clusters	Silhouette scores for several numbers of clusters are accurately reported, and the optimal number of clusters is chosen based on the best reported score. The provided cluster visualization shows the optimal number of clusters found by the chosen clustering algorithm.
Question 8 Data Recovery	The establishments represented by each customer segment are proposed based on the statistical description of the dataset. The inverse transformation and inverse scaling have been properly implemented in code and applied to the cluster centers.
Question 9 Sample Predictions	Sample points are correctly identified by customer segment, and the predicted cluster for each sample point is discussed.
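A self-contained sketch of Questions 7 and 8, assuming K-Means as the chosen algorithm (a Gaussian Mixture Model could be scored the same way) and synthetic log-scaled data. It scores several cluster counts with the mean silhouette coefficient and keeps the highest. It then recovers the cluster centers by inverting the PCA projection and then the log transform. The `kmeans` and `silhouette_score` functions are plain NumPy stand-ins, not a particular library's API.

```python
import numpy as np


def assign(X, centers):
    """Index of the nearest center for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, n_init=10, n_iter=100):
    """Lloyd's k-means with several random restarts; keeps the lowest inertia."""
    best = None
    for seed in range(n_init):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = assign(X, centers)
            new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = assign(X, centers)
        inertia = ((X - centers[labels]) ** 2).sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]


def silhouette_score(X, labels):
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = d[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding self
        b = min(d[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


# Hypothetical log-scaled spending data: three groups of 60 customers in 6 features.
rng = np.random.default_rng(7)
group_means = np.array([[9, 7, 7, 8, 5, 6], [7, 9, 9, 6, 8, 7], [8, 8, 8, 8, 7, 8]], dtype=float)
log_data = np.vstack([m + 0.3 * rng.normal(size=(60, 6)) for m in group_means])

# Two-dimensional PCA via SVD, keeping the mean and axes for the inverse step.
mean = log_data.mean(axis=0)
_, _, vt = np.linalg.svd(log_data - mean, full_matrices=False)
components = vt[:2]
reduced = (log_data - mean) @ components.T

fits, scores = {}, {}
for k in range(2, 6):
    fits[k] = kmeans(reduced, k)
    scores[k] = silhouette_score(reduced, fits[k][0])
    print(f"k={k}: silhouette={scores[k]:.3f}")
best_k = max(scores, key=scores.get)

# Data recovery: undo the PCA projection, then undo the log transform.
_, centers = fits[best_k]
true_centers = np.exp(centers @ components + mean)
print(np.round(true_centers))
```

The recovered `true_centers` are in the original spending units, which is what the establishment proposals in Question 8 are meant to be based on.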

Conclusion

Criteria	Meet Specification
Question 10 A/B Test	Student correctly identifies how an A/B test can be performed on customers after a change in the wholesale distributor’s service.
Question 11 Predicting Additional Data	Student discusses, with justification, how the clustering data can be used in a supervised learner to make predictions on new data.
Question 12 Comparing Customer Data	Customer segments are compared with the customer ‘Channel’ data. Student discusses how well the ‘Channel’ data identifies the customer segments, including whether this representation is consistent with previous results.
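For Question 12, one simple way to quantify how well ‘Channel’ lines up with the segments is a contingency table plus an agreement rate. Because cluster ids are arbitrary, the sketch tries every one-to-one relabelling of clusters to channels and keeps the best. It assumes there are no more clusters than channel values. The labels and helper names below are hypothetical integer codes and illustrations, not the actual dataset.

```python
from itertools import permutations

import numpy as np


def crosstab(a, b):
    """Contingency table: rows are unique values of a, columns unique values of b."""
    rows, cols = np.unique(a), np.unique(b)
    table = np.array([[np.sum((a == r) & (b == c)) for c in cols] for r in rows])
    return rows, cols, table


def best_agreement(clusters, channels):
    """Fraction of points where clusters match channels under the best relabelling.

    Cluster ids are arbitrary, so every one-to-one mapping of cluster id to
    channel value is tried. Assumes no more clusters than channel values.
    """
    cluster_ids, channel_ids = np.unique(clusters), np.unique(channels)
    best = 0.0
    for perm in permutations(channel_ids, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))
        predicted = np.array([mapping[c] for c in clusters])
        best = max(best, float(np.mean(predicted == channels)))
    return best


# Hypothetical labels: cluster id from the chosen model and 'Channel' code per customer.
clusters = np.array([0, 0, 1, 1, 1, 0, 1, 0])
channels = np.array([2, 2, 1, 1, 2, 2, 1, 2])

rows, cols, table = crosstab(clusters, channels)
print(table)
print(f"Best agreement: {best_agreement(clusters, channels):.3f}")
```

With these toy labels the table is `[[0, 4], [3, 1]]` and the best agreement is 0.875, meaning one customer falls in the segment that does not match its channel.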